





# INTERNATIONAL JOURNAL OF ENGINEERING SCIENCES & RESEARCH TECHNOLOGY

## VLSI IMPLEMENTATION OF AREA EFFICIENT FAST PARALLEL FIR DIGITAL FILTERS BASED ON FAST FIR ALGORITHM

### Sampath Kumar Dara\*, Dr. G. R. K. S. Subrahmanyam

\* Associate Professor, Department Of ECE, Vaageswari College Of Engineering ,Karimnagar,T.S, India Associate Professor, Department Of ECE, Indian Institute Of Space Science & Technology, Trivendrum, India

### DOI: 10.5281/zenodo.58639

## ABSTRACT

This paper proposes new parallel fir structures to diminish the equipment multifaceted nature of higher request Finite Impulse Response (FIR) channel with symmetric coefficients in view of Fast FIR Algorithms (FFAs). The goal is to plan a zone productive Fast Parallel Finite-Impulse Response (FIR) channel structure which requirement that the channel taps must be a different of 2 or 3. In this brief talked about for three parallel FIR Filter execution in view of recursively utilizing proposed 2 parallel FIR Structure. It misuses the characteristic way of Symmetric co-productive diminishing the quantity of Multipliers in further. The parallel FIR channel structure in light of proposed FFA systems has been executed taking into account Modified convey spare snake (MCSA) for further improvement. The diminishment in equipment unpredictability is accomplished by disposing of the cumbersome multiplier with a viper specifically MCSA. By and large, the proposed parallel FIR structures can prompt noteworthy equipment investment funds for symmetric coefficients from the current FFA parallel FIR structures, especially when the length of the channel is vast.

**KEYWORDS**: Symmetric Filter; Polyphase decomposed Fast Finite Impulse Response (FIR) Algorithm (FFA); Common Sub expression Elimination (CSE) ; Level Constrained Common Sub expression Elimination (LCCSE) Parallel FIR symmetric convolution; Very Large Scale Integration (VLSI); FFA technique.

### INTRODUCTION

The key territory of exploration in VLSI System Design is the region proficient fast information way rationale frameworks. Computerized Filters are a standout amongst the most generally utilized essential gadgets as a part of DSP frameworks, going from unstable development in mixed media signal handling to remote versatile correspondences. FIR channels are utilized as a part of high recurrence applications, as interactive media signal preparing, while some different applications require high throughput with a low-control circuit, for example, Multiple Input Multiple Output (MIMO) frameworks utilized as a part of cell remote correspondences. A higher request advanced channel is utilized video apparition canceller for telecast TV, it lessening the impact of multipath sign echoes. So higher request advanced channel is unavoidable. Range many-sided quality is improved by decreasing cumbersome multipliers from (2N - N/L) to L x N utilizing the FFA procedure. The Iterated Short Convolution (ISC) based straight convolution structure is transposed to get another equipment proficient. i.e., Small-sized separating structures recursively used to built huge channels in [6]-[10]. This paper is sorted out as takes after. A brief presentation of existing FFAs is appeared in Section II. In Section III, the proposed new parallel FIR channel structures are displayed. Area IV explores the intricacy examination of a current with proposed structures. In Section V, the portrayal of equipment execution and the trial results are appeared. At long last, segment VII portrays the conclusions and future work.

### **Quick FIR Algorithm (FFA)**

All in all the yield of a n-tap FIR channel which can be communicated as in (1)

http://www.ijesrt.com © International Journal of Engineering Sciences & Research Technology



[Dara\* et al., 5(7): July, 2016] IC<sup>TM</sup> Value: 3.00 N-1  $y(n) = \sum_{i=0}^{N-1} h(i) \cdot x(n-i), n = 0, 1, 2, ...., \infty$ (1)

ISSN: 2277-9655 Impact Factor: 4.116

where the information  $\{x(n)\}\$  is a limitless length info succession of the length N FIR channel coefficients. At that point, the conventional L-Parallel FIR channel can be communicated from polyphase decay as in [3].

$$\sum_{i=0}^{L-1} Y_{p}(z^{L}). \ z^{*p} = \sum_{q=0}^{L-1} X_{q}(z^{L})z^{*q} \sum_{r=0}^{L-1} H_{r}(z^{*L})z^{*r}$$
(2)

K=0

From the above condition demonstrates that the conventional FIR channel will require L2 Sub channel pieces of length N/L for its usage.

(1). Existing 2x2 FFA (Level of parallelism, L = 2):

By utilizing condition (2) a two parallel FIR channel can be depicted as,

$$Y0 + z - 1Y1 = (H0 + z - 1 H1) (X0 + z - 1 X1) = H0X0 + z - 1$$

(H0X1 + H1X0) + z-2X1H1 (3)

In the wake of disentangling this can be,

$$Y_0 = H_0 X_0 + z^{-2} X_1 H_1, Y_1 = H_0 X_1 + X_0 H_1$$
(4)

Condition (3) and (4) demonstrates the conventional two-parallel channel structure, which will require four length N/2 FIR sub channel pieces, two post handling adders, and absolutely 2N multipliers and (2 N - 2) adders. Be that as it may, (4) can be composed as,

$$Y_{0} = H_{0}X_{0} + z^{*2}X_{1}H_{1}$$
  

$$Y_{1} = (H_{0} + H_{1})(X_{0} + X_{1}) - H_{0}X_{0} - H_{1}X_{1}$$
(5)

This above condition (5) will take three FIR sub channel pieces of length N/2, one preprocessing and three post handling adders, and 3N/2 multipliers and 3(N/2 - 1) + 4 adders, which diminishes roughly one forward over the conventional two-parallel channel equipment cost from the condition (4). The Two-parallel (L=2) FIR channel execution utilizing acquired from (5) is appeared in Fig.1.



Fig. 1: Existing FFA for Two-parallel FIR filter implementation

Existing 3x3 FFA (Level of parallelism, L = 3):

By the similar approach, a three- Parallel FIR filter using FFA can be described as,

http://www.ijesrt.com



[Dara\* *et al.*, 5(7): July, 2016] IC<sup>TM</sup> Value: 3.00 Y0 = H0X0 - z- 3 X2H2 + z-3 [(H0 + H1) (X0 + X1) - H1X1] Y1 = [(H0 + H1) (X0 + X1) - H1X1] - (H0X0 - z - 3 X2H2)

Y2 = [(H0 + H1 + H)(X0 + X1 + X2)] - [(H0 + H1)(X0 + X1) - H1X1] - [(H1 + H2)(X1 + X2) - H1X1](6) This above equation (5) will take six length N/3FIR sub filter blocks, three preprocessing and seven post processing adders, and N multipliers and 2N + 4 adders, which further reduces approximately one third over the traditional three-parallel filter hardware cost from the equation(6). The Three-parallel (L=3) FIR filter implementation using obtained from (6) is shown in Fig.2



Fig. 2: Existing FFA for Three-parallel FIR filter implementation

## PROPOSED NEW PARALLEL STRUCTURES BASED ON FAST FIR ALGORITHM

The Main idea behind the proposed structure is based on Fast FIR Algorithm which utilizes the symmetric co-efficient, manipulating the poly phase decomposition to earn as many sub filter blocks as possible. The sub filter blocks containing symmetric coefficients, so the required number of multiplications is reduced to half in a single Sub filter block. It can be recursively used for the whole taps.

#### Proposed 2x2 FFA (Level of parallelism, L = 2):

From (4), a two – parallel FIR Filter can also be written as,  $Y_0 = \{ \frac{1}{2} [(H0 + H_1) (X_0 + X_1) + (H0 - H_1) (X_0 - X_1)] - H_1 X_1 \} + z^{*2} H_1 X_1$  $Y_1 = \frac{1}{2} [(H0 + H_1) (X0 - X_1) - (H0 - H_1) (X_0 - X_1)] \dots (7)$ 

Fig. 3: Proposed FFA for Two-parallel FIR filter

For an example Consider a 16 – tap FIR Filter with a set of symmetric coefficients applying to the proposed two-Parallel FIR filter as follows,

{ h(0),h(1),h(2),h(3),h(4),h(5),h(6),h(7),

h(8),h(9),h(10),h(11),h(12),h(13),h(14),h(15)}

where h(0) = h(15), h(1) = h(14), h(2) = h(13), h(3) = h(12), h(4) = h(11), h(5) = h(10), h(6) = h(9), h(7) = h(8),

applying to the proposed two parallel FIR Filter structure, and the top two sub filter blocks will be as,

 $\dot{H0} + \dot{H1} = \{ h(0) + \dot{h}(1), h(2) + h(3), h(4) + h(5), h(6) + h(7), h(8) + \dot{h}(9), h(10) + h(11), h(12) + h(13), h(14) + h(15) \}$ 

Where h(0) + h(1) = + h(14) + h(15), h(2) + h(3) = + h(12) + h(13), h(4) + h(5) = + h(10) + h(11), h(6) + h(7) = + h(8) + h(9) ......(8)

From this example, the proposed two parallel FIR Filter structure having two sub filter blocks out of three. i.e., H0 + H1 and H0 - H1 these two sub filter blocks are with symmetric coefficients now, as per equation (8). The sub filter block is realized in fig.4.



Fig. 3: Sub filter implementation with symmetric coefficients

© International Journal of Engineering Sciences & Research Technology

## ISSN: 2277-9655 Impact Factor: 4.116



[Dara\* et al., 5(7): July, 2016]

#### **ICTM Value: 3.00**

Proposed 3x3 FFA (Level of parallelism, L = 3):

By the comparative methodology of (7), a three-parallel FIR Filter can likewise be communicated by recursively utilizing the proposed 2x2 FFA.

**ISSN: 2277-9655** 

**Impact Factor: 4.116** 

 $Y01/2 = [(H0 + H1) (X0 + X1) + (H0 - H1) (X0 - X1)] - H1 X1 + z-3 {(H0+H1+H2) (X0 + X1 + X2)}$ 

-(H0 + H1)(X0 + X1) - [(H0 + H1)(X0 + X1) + (H0 - H1)(X0 - X1)] - H1 X1

 $Y11/2 = [(H0 + H1) (X0 + X1) - (H0 - H1) (X0 - X1)] + z-3 \{(H0 + H2) (X0 + X2) - (H0 - H2) (X0 - Y2) - (H0 - H2) (X0 - Y2) - (H0 - H2) (X0 - Y2) - (H0 - H2) (X0 - H2) (X0 - H2) - (H0 - H2) (X0 - H2) (X0 - H2) (X0 - H2) - (H0 - H2) - (H0$ 

 $[(H0+H1)(X0+X1) + (H0-H1)(X0-X1)] - H1X1\}$ 

 $Y2 \ 1/2 = \{ [(H0 + H2) (X0 + X2) - (H0 - H2) (X0 - X2)] \} + H1 \ X1 \ \dots \ \dots \ (9)$ 

This above condition (9) goes to an arrangement of even symmetric coefficients, which can gain four sub channel pieces containing symmetric coefficients from the aggregate of six sub channel obstructs in the current 3x3 parallel FFA Structure, The Proposed Three-parallel (L=3) FIR channel usage utilizing acquired from (9) is appeared in Fig.4.



Fig. 4: Proposed FFA for Three-parallel FIR filter



Fig. 5: Comparison of Sub filter blocks between existing FFA and the Proposed FFA for Three-parallel FIR filter Structures

From the above figure5. The shadow region in the proposed FFA stands for symmetric coefficients. Therefore, for an N –tap three –parallel FIR Filter, the proposed structure can earn N/3 multipliers from the existing FFA structure. However, this proposed three-parallel FIR structure also brings on overhead of seven additional adders in both preprocessing and post processing adders.

### PROPOSED CASCADING FFA (BASED ON SYMMETRIC COEFFICIENTS)

The larger block-sized filtering structures can be constructed through iterations of the small-sized filtering structures. The proposed parallel Cascading FFA structure enables the reuse of multipliers in parts of the sub-filter blocks but it also brings more adder cost in preprocessing and post processing blocks. When cascading the proposed FFA parallel FIR structures for larger parallel block factor L, the increase of adders can become larger. Due to this other than applying the proposed FFA structure to all decomposed sub filter block blocks, the existing FFA structures which

© International Journal of Engineering Sciences & Research Technology http://www.ijesrt.com [1180]



# [Dara\* et al., 5(7): July, 2016]

## ICTM Value: 3.00

ISSN: 2277-9655 Impact Factor: 4.116

have more compact operations in pre processing and post processing blocks are employed for those sub-filter blocks that contain no symmetric coefficients.



Fig. 6: Proposed FFA for Four-parallel FIR filter

## Proposed Canonical Sign Digit Representation (for Constant Symmetric coefficients multiplication)

The canonical signed digit (CSD) representation is one of the existing signed digit (SD) representations with unique features which make it useful in certain DSP applications focusing on low-power, efficient-area and high-speed arithmetic.



Fig7: Hardware Architecture of Binary to CSD Conversion

The CSD code is alter nary number system with the digit set{ 1 0 1},where 1 stands for 1.Given a constant, the corresponding CSD representation is unique and has two main properties:(1)the number of nonzero digits is minimal ,and (2) no two consecutive digits are both non zero, that is, two non zero digits are not adjacent. The first property implies a minimal Hamming weight, which leads to a reduction in the number of additions in arithmetic operations. The second property provides its uniqueness characteristic. For a constant CSD representation has proven to be useful for the design and implementation of digital filters such as the area-efficient program-mable FIR digital filter architecture in Chebyshev FIR filter can be design with some constrains in terms of hardware and frequency domain. Figure. 7 Shown the proposed CSD conversion from its equivalent binary number. An efficient implementation for CSD conversion is shown in Fig.8. The fast implementation of a CSD recoding is based on the binary associative property. This is developed for their prefix adders.

## **Design of Multiplier and Adder Blocks**

Finite-impulse response (FIR) digital filters are widely employed in digital signal processing due to their stability and linear-phase property. The FIR filter consists of many constant multiplications, in past have decomposed the multiplications into simple operations such as addition, subtraction and shift and have tried to share as many partial sums as possible to reduce the hardware complexity. As shown in Figure 8, all coefficients are considered as a whole to design all the Constant multiplications into a hardware block called a multiplier block. There are two metrics that are important in the design and comparison of digital FIR filters. The first one is the adder cost, which is the number



[Dara\* *et al.*, 5(7): July, 2016] IC<sup>TM</sup> Value: 3.00

## ISSN: 2277-9655 Impact Factor: 4.116

of adders required to implement a given set of filter coefficients, and the second is the adder step, which is the number of adders passing through the critical path. Many algorithms have been proposed to reduce the adder cost and adder step, which can be categorized into two groups: 1) dependence-graph and 2) common sub expression elimination (CSE) algorithms. Since the FIR filter consists of many constant multiplications, earlier works have decomposed the multiplications into simple operations such as addition, subtraction and shift and have tried to share the partial sums as many as possible in order to reduce the hardware complexity.



Fig.8: Replacing Constant Multiplication by Multiplier

### Block in Sub filter Block of symmetric Convolution

Example 1: Let Coefficient Set As  $\{3, 13, 219, 221\}$ Consider X=1001, h0=11, X\*h0 = 1001 \* 11 1 0 0 1 ------- X 1 0 0 1 0 ------- X <br/>(+) X\*h0= X + X << 1 Multiplier Block is being replaced by Multiplier. The Multiplier block is simply nothing but only Adders. So the Add/Shift Implementation is consists of only Adders and Shifts. 3\*X = 0011 \* X = X + X << 1 13\*X = 1101 \* X = X + X << 2 + X << 3 + X << 4 + X << 6 + X << 7 221 \* X = 1101 1101 \* X = X + X << 2 + X << 3 + X << 4 + X << 6 + X <<7

In the above coefficient set {3, 13,219,221} can be implemented

#### COMPLEXITY ANALYSIS OF AN EXISTING WITH PROPOSED STRUCTURES

Whenever the L-parallel FIR filter comes with a set of symmetric coefficients of length N, the number of required Multipliers for the proposed parallel FIR filter structures is given by,

$$\frac{N}{\prod_{i=1}^{r}L_{i}} \underset{\text{is even,}}{M} = \frac{N}{\prod_{i=1}^{r}L_{i}} \left( \prod_{i=1}^{r}M_{i} - \frac{s}{2} \right)$$

$$(10)$$

$$\frac{N}{\prod_{i=1}^{r}L_{i}} \underset{\text{is odd,}}{M} = \frac{N}{\prod_{i=1}^{r}L_{i}} \prod_{i=1}^{r}M_{i} - \frac{s}{2} \left( \frac{N}{\prod_{i=1}^{r}L_{i}} - 1 \right)$$

$$(11)$$



#### [Dara\* *et al.*, 5(7): July, 2016] IC<sup>TM</sup> Value: 3.00

# ISSN: 2277-9655 Impact Factor: 4.116

Where, Li is the small parallel block size such as (2x2) or (3x3) FFA. r is the number of FFAs used. Mr is the number of sub filter blocks used from i-th FFA. Where, S is the number of sub filter blocks containing symmetric coefficients. The number of the required adders in sub filter section can be given by,

$$A_{Sub} = \prod_{i=1}^{r} M_i \left( \frac{N}{\prod_{i=1}^{r} L_i} - 1 \right)$$

## IMPLEMENTATION AND EXPERIMENTAL RESULTS

The proposed FFA structures and the existing FFA structures are implemented in VHDL with filter length of 16 and 72, word length of 8-bit and 16-bit, respectively. Two sets of Ideal Low pass FIR filter symmetric coefficients of length 16 and 72 are generated by MATLAB using Remez Exchange algorithm. The sub filters are based on Canonical Sign digit (CSD) structure and Reconfigurable Carry Save adder (RCSA) introduced in [7] are used. Figures 8,9,10 and 9 shows the output results of an 16 tap FIR filter implemented in VHDL Language on Xilinx FPGA SPARTAN 3E kit. The implementation of Remez Linear phase FIR filter of order 72 is done for the specifications characterizing an Equiripple FIR design method for a low pass response type and filter structure is Direct form symmetric FIR.

#### **Experimental Results**



Fig.9: Magnitude Response of Linear Phase Fir Filter of Order 72 with Symmetry Co-Efficient



Fig.10.Output Wave form of proposed three - Parallel 72 tap FIR Filter. (8 bit)



# [Dara\* et al., 5(7): July, 2016]

ICTM Value: 3.00

## ISSN: 2277-9655 Impact Factor: 4.116

This above results shows the proposed 3X3 parallel linear phase fir filter of order 72 with symmetric co-efficient in terms of Area, Power and critical path delay. Area can be calculated with the help of number of gates used in Field Programmable Gate Arrays. Also the power utilization is measured with the help of Xilinx 9.2i under the various power consumption levels like static, dynamic and leakage power consumption. Power consumption is depends on area utilized. i.e., number of gates to be used. As the above both simulation and synthesis results clearly shows, the circuit complexity in terms of area, power and delay of 72 order Linear phase proposed parallel FIR Digital Filters as listed below in Table II, and also comparison of various sub-filter methods of proposed for linear phase fir filter structures under the different level of parallelism. Validated the proposed techniques on Spartan 3E device where the significant area is observed and power reductions in both Common Sub expressions elimination (CSE) and Level Constrained Common Sub expression Elimination (LCCSE) techniques. Comparing the Proposed methods in sub-filter sections with conventional methods,

- 1. Using proposed LCCSE algorithm
- a) Power consumption 68.768% reduced
- b) Area 75.506% reduced
- c) Delay 30.95% reduced
- 2. Using CSE algorithm
- a) Power consumption 24.354% reduced
- b) Area 36.433% reduced
- c) Delay 20.4% reduced



Fig.11. RTL Schematic View of proposed three- parallel 72 tap FIR Filter (8bit)

## **CONCLUSION & FUTURE WORK**

The new parallel FIR channel structures has proposed, which are helpful to symmetric convolutions when the quantity of taps is the numerous of 2 or 3.(i.e., Even length based). Multipliers are the real divides in equipment utilization for the parallel FIR channel execution. The proposed new structure misuses the way of even symmetric coefficients and recovery a lot of multipliers to the detriment of extra adders. Since multipliers exceed adders in equipment cost, it is productive to trade multipliers with adders. Subsequently, the bigger the length of FIR channels is, the more the proposed structures can spare from the current FFA structures, concerning the equipment cost. The new parallel FIR structures comprising of favorable circumstances in both polyphase disintegrations managing symmetric convolutions similarly superior to the current FFA structures as far as equipment utilization.

The proposed new parallel FFA Structures For FIR channels are a range effective. Since, around 40% of the zone is spared with this system when contrasted with traditional and existing FFA FIR channel plan. Region effectiveness and rapid is accomplished with new parallel FFA strategy at exceptionally slight expense of force utilization for expansive tap FIR channel. Since, Parallel FIR channels are a zone productive and contained less postpone, so these channels can be utilized as a part of different applications, for example, beat forming FIR channel in WCDMA framework, programming outline radio and flag preparing framework for rapid. In future this work can be reached out for outlining Odd length based FIR channels with less equipment unpredictability.



# [Dara\* et al., 5(7): July, 2016] ICTM Value: 3.00

## **ISSN: 2277-9655 Impact Factor: 4.116**

## REFERENCES

- [1] Yu-Chi Tsao and Ken Choi, "Area-Efficient Parallel FIR Digital Filter Structures for Symmetric Convolutions Based on Fast FIR Algorithm" IEEE Transactions on Veri Large Scale Integration (VLSI) Systems, vol. 20, no. 2, pp. 366–371, Feb. 2012.
- [2] C. Cheng and K. K. Parhi, "Low-cost parallel FIR structures with 2-stage parallelism," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 54, no. 2, pp. 280–290, Feb. 2007.
- [3] C. Cheng and K. K. Parhi, "Furthur complexity reduction of parallel FIR filters," in Proc. IEEE Int. Symp. Circuits Syst. (ISCAS 2005), Kobe, Japan, May 2005.
- [4] C. Cheng and K. K. Parhi, "Hardware efficient fast parallel FIR filter structures based on iterated short convolution," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 51, no. 8, pp. 1492-1500, Aug. 2004.
- [5] J. G. Chung and K. K. Parhi, "Frequency-spectrum-based low-area low-power parallel FIR filter design," EURASIP J. Appl. Signal Process., vol. 2002, no. 9, pp. 444–453, 2002.
- [6] K. K. Parhi, VLSI Digital Signal Processing Systems: Design and Implementation. New York: Wiley, 1999.
- [7] D. A. Parker and K. K. Parhi, "Low-area/power parallel FIR digital filter implementations," J. VLSI Signal Process. Syst., vol. 17, no. 1, pp. 75–92, 1997.
- [8] B.RAmkumar, Harish M Kittur, P.Mahesh kannan "ASIC Implementation of Modified Carry Save Adder" European Journal of Scientific Research ISSN 1450-216X Vol.42 No.1(2010),pp.53-58.
- [9] Z.-J. Mou and P. Duhamel, "Short-length FIR filters and their use in fast nonrecursive filtering," IEEE Trans. Signal Process., vol. 39, no. 6, pp. 1322–1332, Jun. 1991.
- [10] J.I.Acha, "Computational structures for fast implementation of L-path and L-block digital filters," IEEE Trans. Circuit Syst., vol. 36, no. 6, pp. 805-812, Jun.1989.
- [11] K. Parhi, VLSI Digital Signal Processing Systems: Design and Implementation. New York: Wiley, 1999.
- [12] A. Dempster et al., "Designing multiplier blocks with low logic depth," in Proc. ISCAS, May 2002, vol. 6, pp. 773–776.
- [13] S. Vijay et al., "A greedy common subexpression elimination algorithm for implementing FIR filters," in Proc. ISCAS, May 2007, pp. 3461-3464.